Graption: A graph-based P2P traffic classification framework for the internet backbone
Authors: M. Iliofotou et al.
Abstract
M. Iliofotou et al. / Computer Networks 55 (2011) 1909–1920

1 The authors thank CAIDA for providing this set of traffic traces. Additional information for these traces can be found in DatCat, the Internet Measurement Data Catalog [8], indexed under the label "PAIX".

... behavior. Flow-level and payload-based classification methods require per-application training and will thus not detect P2P traffic from emerging protocols. Behavioral-host-based approaches such as BLINC [25] can detect traffic from new protocols, but perform poorly when applied at the backbone [26]. In addition, most tools, including BLINC [25], require fine-tuning and careful selection of parameters [26]. We discuss the limitations of previous methods in more detail in Section 4.

In this paper, we use the network-wide behavior of an application to assist in classifying its traffic. To model this behavior, we use graphs where each node is an IP address and each edge represents a type of interaction between two nodes. We use the term Traffic Dispersion Graph, or TDG, to refer to such a graph [19]. Intuitively, TDGs enable the detection of network-wide behavior (e.g., highly connected graphs) that is common among P2P applications and different from other traffic (e.g., Web). While we recognize that some previous efforts [6,9] have used graphs to detect worm activity, they have not explored the full capabilities of TDGs for application classification. This paper is an extension of a workshop paper [18]; the differences are clarified in the related work section (Section 4).

We propose a classification framework, dubbed Graption (Graph-based classification), as a systematic way to combine network-wide behavior and flow-level characteristics of network applications. Graption first groups flows using flow-level features in an unsupervised and agnostic way, i.e., without using application-specific knowledge. It then uses TDGs to classify each group of flows.
As a proof of concept, we instantiate our framework and develop a P2P detection method, which we call Graption-P2P. Compared to other methods (e.g., BLINC [25]), Graption-P2P is easier to configure and requires fewer parameters. The highlights of our work can be summarized in the following points:

Distinguishing between P2P and client–server TDGs. We use real-world backbone traces and derive graph-theoretic metrics that can distinguish between the TDGs formed by client–server (e.g., Web) and P2P (e.g., eDonkey) applications (Section 2.2).

Practical considerations for TDGs. We show that even a single backbone link contains enough information to generate TDGs that can be used to classify traffic. In addition, TDGs of the same application seem fairly consistent across time (Section 2.3).

High P2P classification accuracy. Our framework instantiation (Graption-P2P) classifies 90% of P2P traffic with 95% accuracy when applied at the backbone. Such traces are particularly challenging for other methods (Section 3.2.2).

Comparison with a behavioral-host-based method. Graption-P2P performs better than BLINC [25] in P2P identification at the backbone. For example, Graption-P2P identifies 95% of BitTorrent traffic while BLINC identifies only 25% (Section 3.3).

Identifying the unknown. Using Graption, we identified a P2P overlay of the Slapper worm. The TDG of Slapper was never used to train our classifier. This is a promising result showing that our approach can be used to detect both known and unknown P2P applications (Section 3.4).

The rest of the paper is organized as follows. In Section 2 we define TDGs and identify TDG-based metrics that differentiate between applications. In Section 3 we present the Graption framework and our instantiation, Graption-P2P. In Section 4 we discuss related work. In Section 5 we discuss various practical issues. Finally, in Section 6 we conclude the paper.

2. Studying the TDGs of P2P applications

2.1. Traffic dispersion graphs (TDGs)

Definition. Throughout this paper, we assume that packets can be grouped into flows using the standard 5-tuple {srcIP, srcPort, dstIP, dstPort, protocol}. Given a group of flows S, collected over a fixed-length time interval, we define the corresponding TDG to be a directed graph G(V,E), where the set of nodes V corresponds to the set of IP addresses in S, and there is a link (u,v) ∈ E from u to v if there is a flow f ∈ S between them. In this paper, we consider bidirectional flows. We define a TCP flow to start on the first packet with the SYN flag set and the ACK flag not set, so that the initiator and the recipient of the flow are defined for the purposes of direction. For UDP flows, direction is decided by the first packet of the flow.

Visualization examples. In Fig. 1, we show TDG examples from two different applications. To motivate the discussion in the rest of the paper, we show the contrast between a P2P and a client–server TDG. From the figure we see that P2P traffic forms more connected and denser graphs than client–server traffic. In Section 2.2, we show how to translate the visual intuition of Fig. 1 into quantitative measures that can be used to classify TDGs that correspond to different applications.

Data set. To study TDGs, we use three backbone traces from a Tier-1 ISP and the Abilene (Internet2) network. These traces are summarized in Table 1. All data are IP anonymized and contain traffic from both directions of the link. The TR-PAY1 and TR-PAY2 traces were collected from an OC48 link of a commercial US Tier-1 ISP at the Palo Alto Internet eXchange (PAIX). To the best of our knowledge, these are the most recent backbone traces with payload that are made available to researchers by CAIDA [5]. The TR-ABIL trace is a publicly available data set collected from the Abilene (Internet2) academic network connecting Indianapolis with Kansas City.
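The TDG definition in Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the Packet record, its field names, and all helper functions are hypothetical, and the two metrics (average degree and the fraction of nodes in the largest connected component) are stand-ins chosen to echo the "more connected and denser" intuition, not necessarily the metrics derived in Section 2.2.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; field names are illustrative."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str      # "TCP" or "UDP"
    syn: bool = False
    ack: bool = False


def flow_key(p):
    """Direction-independent 5-tuple key, so both directions share one flow."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    return (min(a, b), max(a, b), p.protocol)


def build_tdg(packets):
    """Return the directed edge set {(initiator_ip, recipient_ip)} of the TDG."""
    direction = {}  # flow key -> (initiator IP, recipient IP)
    for p in packets:
        key = flow_key(p)
        if key in direction:
            continue  # direction of this flow is already known
        if p.protocol == "TCP":
            # A TCP flow starts at the first packet with SYN set and ACK not set.
            # Assumption: TCP flows whose start was not captured are skipped.
            if p.syn and not p.ack:
                direction[key] = (p.src_ip, p.dst_ip)
        else:
            # For UDP, the first packet of the flow decides the direction.
            direction[key] = (p.src_ip, p.dst_ip)
    return set(direction.values())


def average_degree(edges):
    """Average node degree, counting each directed edge once: 2|E| / |V|."""
    nodes = {ip for edge in edges for ip in edge}
    return 2 * len(edges) / len(nodes) if nodes else 0.0


def largest_component_fraction(edges):
    """Fraction of nodes in the largest component, ignoring edge direction."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:  # iterative depth-first search over one component
            node = stack.pop()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best / len(adj) if adj else 0.0
```

Calling build_tdg on the packets of one flow group observed in a fixed-length interval yields the edge set of that group's TDG, which the two metric functions then summarize. Representing the graph as a plain set of (initiator, recipient) pairs keeps the sketch dependency-free; a graph library would be a reasonable substitute for larger traces.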
The Abilene trace consists of five randomly selected five-minute samples taken every day for one month, and covers both day and night hours as well as weekdays and weekends.

Extracting ground truth. We used a Payload-based Classifier (PC) to establish the ground truth of flows for the ...

[Fig. 1. TDG examples from two different applications (figure not reproduced).]
Similar resources
Graption: Graph-based P2P Traffic Classification at the Internet Backbone
Abstract: Monitoring network traffic and classifying applications are essential functions for network administrators. Current traffic classification methods can be grouped in three categories: (a) flow-based (e.g., packet sizing/timing features), (b) payload-based, and (c) host-based. Methods from all three categories have limitations, especially when it comes to detecting new applications, and c...
Graption: Automated Detection of P2P Applications using Traffic Dispersion Graphs (TDGs)
Monitoring network traffic and detecting emerging P2P applications is an increasingly challenging problem since new applications obfuscate their traffic. Despite recent efforts, the problem is not yet solved and network administrators are still looking for effective and deployable tools. In this paper, we address this problem using Traffic Dispersion Graphs (TDGs), a novel way to analyze traffi...
Graph Based Classification of Content and Users in BitTorrent
P2P downloads still represent a large portion of today’s Internet traffic. More than 100 million users operate BitTorrent and generate more than 30% of the total Internet traffic [7]. Recently, a significant research effort has been done to develop tools for automatic classification of Internet traffic by application [9, 8, 11]. The purpose of the present work is to provide a framework for subc...
Design of Efficient Caching Algorithms for Peer to Peer Networks to Reduce Traffic in Internet
Peer-to-peer (P2P) file sharing systems generate a major portion of the Internet traffic, and this portion is expected to increase in the future. We explore the potential of deploying proxy caches in different autonomous systems with the goal of reducing the cost incurred by Internet service providers and alleviating the load on the Internet backbone. P2P traffic is more complex than caching ot...
File-sharing in the Internet: A characterization of P2P traffic in the backbone
Since the outbreak of peer-to-peer (P2P) networking with Napster during the late ’90s, P2P applications have multiplied, become sophisticated and emerged as a significant fraction of Internet traffic. At first, P2P traffic was easily recognizable since P2P protocols used specific application TCP or UDP port numbers. However, current P2P applications have the ability to use arbitrary ports to “c...
Journal title:
- Computer Networks
Volume 55, Issue
Pages -
Publication date 2011